11.2 Markov Chains

make from a long, unbroken observation. The problem becomes particularly acute when evidence for higher-order Markov chains is sought, for which the quantity of data required might be unattainable.⁸ An important result is Whittle's formula, which gives the distribution of the transition count:

$$ N_{uv}^{(n)}(F) \;=\; \frac{\prod_i f_{i\cdot}!}{\prod_{ij} f_{ij}!}\, F^{*}_{uv} \qquad (11.9) $$

where $N_{uv}^{(n)}(F)$ is the number of sequences $(a_1, a_2, \ldots, a_{n+1})$ having transition count $F = \{f_{ij}\}$, and satisfying $a_1 = u$ and $a_{n+1} = v$. The transition count together with the initial state (with probability $p_{a_1}$) forms a sufficient statistic for the process, since

$$ p_{a_1} p_{a_1 a_2} \cdots p_{a_n a_{n+1}} \;=\; p_{a_1} \prod_{ij} p_{ij}^{\,f_{ij}} \qquad (11.10) $$

where the left-hand side is simply the probability of realizing a particular sequence $\{x_1, x_2, \ldots, x_{n+1}\}$. For $i, j = 1, \ldots, s$, $f_{ij}$ is the number of $m$, with $1 \le m \le n$, for which $a_m = i$ and $a_{m+1} = j$; $F$ is therefore an $s \times s$ matrix, such that $\sum_{ij} f_{ij} = n$ and such that $f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}$, $i = 1, \ldots, s$, for some pair $u, v$, where $f_{i\cdot} = \sum_j f_{ij}$, and $\{f_{i\cdot}\}$ and $\{f_{\cdot j}\}$ are the frequency counts of $\{a_1, \ldots, a_n\}$ and $\{a_2, \ldots, a_{n+1}\}$, respectively, from which $f_{i\cdot} - f_{\cdot i} = \delta_{i a_1} - \delta_{i a_{n+1}}$. In Eq. (11.9), $F^{*}_{uv}$ is the $(v, u)$th cofactor of the matrix $F^{*} = \{f^{*}_{ij}\}$, with components

$$ f^{*}_{ij} \;=\; \begin{cases} \delta_{ij} - f_{ij}/f_{i\cdot} & \text{if } f_{i\cdot} > 0 \\ \delta_{ij} & \text{if } f_{i\cdot} = 0 \end{cases} \qquad (11.11) $$
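To make Eqs. (11.9)–(11.11) concrete, the sketch below (not from the text) tallies the transition count of a sequence and then evaluates Whittle's formula for it; the function names (`transition_counts`, `whittle_count`), the 0-based state labels, and the example sequence are illustrative choices, not the book's.

```python
import numpy as np
from math import factorial, prod

def transition_counts(a, s):
    """F = {f_ij}: number of m with a_m = i and a_{m+1} = j (states labelled 0..s-1)."""
    F = np.zeros((s, s), dtype=int)
    for i, j in zip(a[:-1], a[1:]):
        F[i, j] += 1
    return F

def whittle_count(F, u, v):
    """Eq. (11.9): number of sequences with transition count F, a_1 = u and a_{n+1} = v."""
    F = np.asarray(F, dtype=float)
    s = F.shape[0]
    row = F.sum(axis=1)                                   # f_i.
    # F* of Eq. (11.11): delta_ij - f_ij/f_i., rows with zero count left as delta_ij
    Fstar = np.eye(s)
    nz = row > 0
    Fstar[nz, :] -= F[nz, :] / row[nz, None]
    # (v, u)th cofactor of F*: signed determinant with row v and column u removed
    minor = np.delete(np.delete(Fstar, v, axis=0), u, axis=1)
    cofactor = (-1) ** (u + v) * (np.linalg.det(minor) if minor.size else 1.0)
    # multinomial factor  prod_i f_i.!  /  prod_ij f_ij!
    factor = prod(factorial(int(r)) for r in row) / prod(factorial(int(f)) for f in F.flat)
    return round(factor * cofactor)

# Example: over two states, the count matrix with one 0->0, one 0->1 and one 1->0
# transition is realized by exactly two sequences starting and ending in state 0,
# namely (0, 0, 1, 0) and (0, 1, 0, 0).
F = transition_counts([0, 0, 1, 0], s=2)
print(whittle_count(F, u=0, v=0))   # -> 2
```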

Problem. Prove that if $P$ is stochastic, then any power of $P$ is also stochastic.

The entropy of the transitions (i.e., the weighted variety of the transitions) can be found from each row of the stochastic matrix according to Eq. (6.5). The (informational) entropy of the process as a whole is then the weighted average of these entropies, the weighting being given by the equilibrium distribution of the states. Hence, in a sense, the entropy of a Markov process is an average of averages.
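As a minimal sketch (not from the text), and assuming Eq. (6.5) is the usual Shannon entropy $-\sum_j p_{ij} \log p_{ij}$ of a row, the following computes this average of averages for an illustrative two-state chain; the matrix and variable names are invented for the example.

```python
import numpy as np

# Illustrative two-state stochastic matrix (rows sum to 1); not from the text.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Equilibrium distribution pi: left eigenvector of P for eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

# Row entropies H_i = -sum_j p_ij log2 p_ij (zero entries contribute nothing),
# then the entropy of the process as the pi-weighted average of the row entropies.
row_H = np.array([-sum(p * np.log2(p) for p in row if p > 0) for row in P])
H = float(pi @ row_H)

print("equilibrium distribution:", pi)      # ~ [0.667, 0.333] for this matrix
print("entropy of the process (bits):", H)
```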

Problem. Consider the three-state Markov chain with transition matrix (rows give the current state, columns the next state)

    →      1      2      3
 1       0.1    0.9    0.0
 2       0.5    0.0    0.5
 3       0.3    0.3    0.4

and calculate (i) the equilibrium proportions of the states 1, 2, and 3 and (ii) the average entropy of the entire process.

⁸ See Billingsley, especially for the proof of Whittle's formula, Eq. (11.9).